Mitochondrial genome of Garcinia mangostana L. variety Mesta

Fruits of Garcinia mangostana L. (mangosteen) are rich in nutrients, and the xanthones found in the pericarp have great pharmaceutical potential. The mangosteen variety Mesta, which is found only in Malaysia, tastes sweeter than the Manggis variety common in Southeast Asia. In this study, we report the complete mitogenome of G. mangostana L. variety Mesta, with a total sequence length of 371,235 bp, of which 1.7% could be of plastid origin. The overall GC content of the mitogenome is 43.8%, and it comprises 29 protein-coding genes, 3 rRNA genes, and 21 tRNA genes. Repeat and tandem repeat sequences accounted for 5.8% and 0.15% of the Mesta mitogenome, respectively. There are 333 predicted RNA-editing sites in the Mesta mitogenome, including the RNA-editing events that generate the start codon of the nad1 gene and the stop codon of the ccmFC gene. Phylogenomic analysis using both maximum likelihood and Bayesian methods grouped the mitogenome of mangosteen variety Mesta within the order Malpighiales. This is the first complete mitogenome from the genus Garcinia and provides a resource for future evolutionary studies.


Results
General features of Garcinia mangostana var. Mesta mitogenome. De novo assembly using Organelle_PBA generated two mitogenome contigs with lengths of 389,277 bp (scf7180000000010) and 20,340 bp (scf7180000000011), respectively. The smaller contig was a subset of the larger (master) contig (Supplementary Figure S1). The larger contig was circular, so it was manually curated by removing one of its identical ends (~18 kb); a total of 63 bases were then added based on the variants detected when short reads were aligned to the trimmed mitogenome (Supplementary Figure S2). The final complete mitogenome of Garcinia mangostana var. Mesta was 371,235 bp, which was slightly larger than the Arabidopsis thaliana mitogenome (367,808 bp) but smaller than the Carica papaya mitogenome (476,890 bp) (Table 1). The average coverage of the Mesta mitogenome was 129× using PacBio subreads (Supplementary Figure S3). The mitogenome comprised 29 protein-coding genes, 3 rRNA genes (rrn5, rrn18, and rrn26), and 21 tRNA genes (Fig. 1 and Table 2). The total length of the protein-coding genes was 28,113 bp, which accounted for 7.6% of the mitogenome. Only five ribosomal protein genes (rpl5, rpl10, rpl16, rps3, and rps4) were found in the Mesta mitogenome.
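The end-trimming step described above can be illustrated with a minimal sketch. It assumes the duplicated ends are exact copies, which is a simplification for real long-read assemblies; the function name `trim_circular_overlap` and the toy sequence are hypothetical and are not part of the authors' workflow, which was a manual curation.

```python
def trim_circular_overlap(contig: str, min_overlap: int = 5) -> str:
    """Remove the duplicated end of a circular contig written out linearly.

    Assumption: the overlap is an exact copy, i.e. the first k bases
    of the contig are identical to its last k bases.
    """
    # Try the longest candidate first so the whole duplicated end is removed.
    for k in range(len(contig) // 2, min_overlap - 1, -1):
        if contig[:k] == contig[-k:]:
            return contig[:-k]
    return contig  # no terminal overlap found; leave the contig unchanged


# Toy sequence (hypothetical): a 13-base circle whose first 10 bases repeat at the end.
toy = "ACGTTGCAACGGG" + "ACGTTGCAAC"
trimmed = trim_circular_overlap(toy)
```

In the study itself, the trimmed sequence was additionally corrected with 63 bases from short-read alignment, which this sketch does not attempt to reproduce.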

Comparison of mitogenome gene content of different species. Comparison of the mitogenome gene content of different species (Fig. 2 and Supplementary Table S1) showed that all the mitogenomes encoded the basic core set of 24 protein-coding genes (complex I, III, IV, V, and cytoplasmic membrane proteins). However, the mitogenome of Passiflora edulis encoded two copies of the genes ccmB, nad4L, nad6, and nad7 and four copies of the gene cox2. The mitogenome of Arabidopsis thaliana also encoded two copies of the atp6 gene, with lengths of 1158 bp and 1050 bp, respectively (Fig. 2 and Supplementary Table S1). Two copies of the ccmFN gene were also found in the mitogenome of Carica papaya. The nad1 gene in Mesta was found to consist of only 3 exons instead of the 5 observed in other species, although the total length of its CDS was almost the same as in the other species. Neither sdh3 nor sdh4, which encode subunits of mitochondrial complex II, was found in the Mesta mitogenome. Most of the ribosomal protein genes were missing in both the Salicaceae and Clusiaceae families of the order Malpighiales. Notably, the gene rps12 was missing from Mesta but present in the other species in the comparison.
Distribution of tRNAs. The 21 tRNAs identified in the Mesta mitogenome correspond to only 14 amino acids (Ser, Phe, Asn, Met, Pro, Gly, Lys, Gln, Tyr, His, Trp, Asp, Glu, Cys). Two of the 21 tRNAs were of chloroplast origin, while the rest were of mitochondrial origin. The tRNA genes for the other six amino acids (Leu, Ile, Thr, Ala, Val, Arg) were not detected. One of the 21 tRNAs was predicted to contain an intron (Supplementary Table S2).
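The count of missing tRNA types can be checked with a simple set difference. The sketch below uses the 20 standard amino acids and the 14 amino acids listed in the text; the variable names are illustrative only.

```python
# The 20 standard proteinogenic amino acids (three-letter codes).
STANDARD_AMINO_ACIDS = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
}

# Amino acids with at least one tRNA gene detected in the Mesta mitogenome,
# as listed in the text.
detected = {
    "Ser", "Phe", "Asn", "Met", "Pro", "Gly", "Lys",
    "Gln", "Tyr", "His", "Trp", "Asp", "Glu", "Cys",
}

# Amino acids without a detected mitochondrial tRNA gene.
missing = sorted(STANDARD_AMINO_ACIDS - detected)
print(missing)
```

The resulting list contains the same six amino acids reported as undetected (Leu, Ile, Thr, Ala, Val, Arg).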
Plastome-derived sequences. Five plastome-derived sequences with an identity of more than 80% and a length of at least 100 bp were found in the mitogenome (Table 3). Together they accounted for 6214 bp, or 1.7% of the mitogenome. The plastome genes they contained were rpl2 (partial), rpl23, trnI-CAU, ndhA (partial), ndhH, rps15 (partial), atpE (partial), atpB, rps3 (partial), and trnD-GUC. Three of the fragments originated from the plastome large single-copy (LSC) region, and one each from the small single-copy (SSC) and inverted repeat (IR) regions.
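The selection criteria and the reported proportion can be expressed as a short sketch. The thresholds (identity above 80%, length of at least 100 bp) and the totals (6214 bp of 371,235 bp) come from the text; the `Hit` record and the example hits are hypothetical placeholders, not the alignments from Table 3.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    identity: float  # percent identity of the alignment
    length: int      # alignment length in bp


def filter_hits(hits, min_identity=80.0, min_length=100):
    """Keep hits with identity above min_identity and length of at least min_length."""
    return [h for h in hits if h.identity > min_identity and h.length >= min_length]


def percent_of_genome(total_bp: int, genome_bp: int) -> float:
    """Share of the genome covered by the given number of bases, in percent."""
    return 100.0 * total_bp / genome_bp


# Hypothetical hits, for illustration only.
example_hits = [Hit(95.2, 1500), Hit(78.0, 900), Hit(99.0, 60), Hit(88.5, 100)]
kept = filter_hits(example_hits)

# Values reported in the text: 6214 bp of plastome-derived sequence in 371,235 bp.
plastid_share = percent_of_genome(6214, 371_235)
```

Rounded to one decimal place, `plastid_share` matches the 1.7% given above.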
A total of 333 putative RNA-editing sites were predicted using PREP-MT (Supplementary Table S4). Among the CDS, the gene nad4 contained the highest number of predicted editing sites (35), while no editing site was found in the atp9 gene. Three annotated genes did not start with the start codon ATG (atp6, nad1, and rpl16) (Supplementary Table S3). Among them, the ACG at the beginning of the nad1 sequence was predicted to be an editing site that is converted into the ATG start codon. Similarly, early termination was predicted in the ccmFC sequence, where editing converted a CGA codon into the stop codon TGA.
Phylogenomic analysis. A total of 22 of the 24 basic core protein-coding genes (excluding ccmFN and mttB) (Table S7) were used for phylogenomic analysis. Two methods were used for the phylogenomic analysis of the mitogenomes: maximum likelihood (ML) and Bayesian phylogenetic analysis. The ML analysis separated the 15 species into different groups according to their order (Fig. 3). The tree topologies from the two methods were almost identical except for the placement of R. communis (Figs. 3 and 4). Nevertheless, both methods grouped Mesta within the order Malpighiales together with P. edulis, S. suchowensis, S. purpurea, P. alba, P. tremula, and P. davidiana.

Discussion
Due to the high complexity of plant mitogenomes, which contain large repetitive regions, long-read sequencing is superior for mitogenome assembly 13,22. In this study, two Mesta mitogenome contigs were obtained using PacBio data. The shorter contig was a subset of the longer one, which had a size of 371,235 bp after manual curation and was considered the complete Mesta mitogenome. It is not uncommon for plant mitogenomes to assemble into multiple contigs, as they can exist in both circular and linear structures due to intramolecular recombination events 12,23,24. For instance, there were 10 contigs in Fagopyrum esculentum 25 and 13 contigs in Picea sitchensis 22. The Mesta mitogenome encoded the basic core set of 24 protein-coding genes 26 commonly found in plant mitogenomes 27. However, the Mesta mitogenome was relatively small compared to other plant mitogenomes 28 from the same order, Malpighiales (Table 1), due to the reduced number of ribosomal protein genes and the absence of sdh3 and sdh4, which encode respiratory chain complex II (Fig. 2 and Supplementary Table S3). These protein-coding genes could have been lost during evolution and might have been transferred to the nuclear genome, as observed in other angiosperm mitogenomes 9,29-31 such as S. latifolia 32, S. noctiflora 33, P. dactylifera, and A. indica 27,30. For instance, rps12 was not found in the Mesta mitogenome, nor in those of Oenothera and Zostera marina 34,35.
A complete set of tRNAs for all 20 amino acids is required for protein translation in plant mitochondria. However, to date, no complete set of tRNA genes has been found in an angiosperm mitogenome 36. In the Mesta mitogenome, tRNA genes for six amino acids (Leu, Ile, Thr, Ala, Val, Arg) were not detected, and these tRNA types are generally reported missing in angiosperm mitogenomes 36. In the mitogenome of S.
Several factors contribute to the differences in plant mitogenome lengths, including the integration of nuclear and plastid sequences as well as the number and length of non-coding regions 26. Chloroplast sequences can be found in plant mitogenomes because of integration events during evolution. Plastome-derived sequences can account for 1 to 12% of a mitogenome 38. For the Mesta mitogenome, the proportion was 1.7%, which was smaller than in other plants such as watermelon (7.6%) 39, P. dactylifera (10.3%) 27, and C. pepo (11.6%) 40.
The repetitive regions found in the intergenic regions of mitogenomes are of various types, such as short repeats, tandem repeats, and long complex repeats 27,28,41,42. Large repeats (> 1 kb) might cause homologous recombination and eventually lead to different configurations of the mitogenome 43. Apart from large repeats, both direct and inverted repeats also contribute to the formation of subgenomic molecules 6. The proportion of repeats detected in the Mesta mitogenome using web-based REPuter was low (5.8%) compared to C. melo (42.7%) 44, V. vinifera (6.8%) 45, and N. colorata (48.89%) 15, but higher than in P. dactylifera (2.3%) 27. Similarly, the proportion of tandem repeats detected in the Mesta mitogenome was also low, at 0.15% compared to 0.33% in P. dactylifera 27.
RNA-editing events are essential in plant development and stress response 46. The most common RNA-editing event in plant organelles (mitochondria and plastids) is the conversion of C to U 47. RNA editing can generate start or stop codons, eliminate premature stop codons, change splicing sites, affect RNA structure, and cause instability of RNAs 46. RNA-editing events were predicted to generate the start codon of nad1 and the stop codon of ccmFC in the Mesta mitogenome. The start codon of the nad1 gene in several species such as A. alpina 48, B. stricta 49, and C. rubella 50 is also formed by RNA editing. Likewise, a predicted stop codon created by editing in the ccmFC gene has been reported in A. thaliana and C. bursa-pastoris 1. The GTG in rpl16 might be a translation start codon, as similar observations were made in the maize, Marchantia, and Petunia mitogenomes 51.
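The two codon changes reported in the Results (ACG to ATG in nad1 and CGA to TGA in ccmFC) can be reproduced with a minimal sketch of C-to-U editing, written at the DNA level so that U appears as T. The function name and the position arguments are illustrative assumptions; this is not how PREP-MT predicts editing sites.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def edit_c_to_u(codon: str, position: int) -> str:
    """Apply a C-to-U edit at a 0-based position, shown as C-to-T in DNA notation."""
    if codon[position] != "C":
        raise ValueError("C-to-U editing requires a C at the edited position")
    return codon[:position] + "T" + codon[position + 1:]


# nad1: the second base of ACG is edited, giving the start codon ATG.
nad1_start = edit_c_to_u("ACG", 1)

# ccmFC: the first base of CGA is edited, giving the stop codon TGA.
ccmFC_stop = edit_c_to_u("CGA", 0)

print(nad1_start == START_CODON, ccmFC_stop in STOP_CODONS)
```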
For phylogenomic analysis, 22 protein-coding genes found in all mitogenomes were used to study the evolutionary relationships among species from the orders Malpighiales, Brassicales, Cucurbitales, Rosales, Fabales, and Vitales. Although one taxon (R. communis) was not clustered within Malpighiales in the Bayesian analysis, both the ML and Bayesian analyses grouped Mesta within the order Malpighiales. This observation agrees with the mangosteen plastome evolutionary study 19. However, the plastome could not resolve the G. mangostana cultivars (Manggis and Mesta) because their protein-coding gene sequences were identical 19. Hence, to verify whether mitogenomes provide better resolution than plastomes in phylogenetic studies of the Clusiaceae family, especially among mangosteen cultivars, more mitogenome assemblies are needed for future comparison among different relatives and cultivars.

Conclusion
The complete mitogenome of Mesta was successfully assembled. The Mesta mitogenome was relatively small compared with those of other species in the same order due to the loss of most ribosomal protein genes and both sdh genes. Phylogenomic analysis based on 22 protein-coding genes from the 15 selected species showed that Mesta clustered within the order Malpighiales. The mitogenome can serve as a reference for studying the regulation of mitochondrial genes.

Materials and methods
Mitochondrial genome assembly. Genome sequences of the Mesta variety were obtained from the NCBI SRA database with the accession numbers SRX2718652 to SRX2718659 for PacBio long-read data (9.5 Gb) 20 and SRX270978 for Illumina short reads (50.2 Gb) 21. CANU v2.0 52 was used to correct and trim the PacBio raw data using default parameters. Next, non-mitogenome reads were removed by aligning each read against the Carica papaya mitogenome (accession no. NC_012116.1), which was used as the reference genome. De novo assembly was then performed using the Organelle_PBA software 53. Manual curation was performed to obtain the complete Mesta mitogenome.
Genome annotation. Gene annotation was conducted using both GeSeq 54 and MITOFY web server 40 .

Data availability
The complete mitogenome sequence of Garcinia mangostana var. Mesta has been submitted to GenBank with the accession number OM759996.